- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > Canada (0.04)
- Instructional Material (0.68)
- Research Report (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
The Impact of Fine-tuning Large Language Models on Automated Program Repair
Macháček, Roman, Grishina, Anastasiia, Hort, Max, Moonen, Leon
Automated Program Repair (APR) uses various tools and techniques to help developers achieve functional and error-free code faster. In recent years, Large Language Models (LLMs) have gained popularity as components in APR tool chains because of their performance and flexibility. However, training such models requires a significant amount of resources. Fine-tuning techniques have been developed to adapt pre-trained LLMs to specific tasks, such as APR, and enhance their performance at far lower computational costs than training from scratch. In this study, we empirically investigate the impact of various fine-tuning techniques on the performance of LLMs used for APR. Our experiments provide insights into the performance of a selection of state-of-the-art LLMs pre-trained on code. The evaluation is done on three popular APR benchmarks (i.e., QuixBugs, Defects4J, and HumanEval-Java) and considers six different LLMs with varying parameter sizes. We consider three training regimens: no fine-tuning, full fine-tuning, and parameter-efficient fine-tuning (PEFT) using LoRA and IA3. We observe that full fine-tuning decreases the benchmarking performance of various models due to different data distributions and overfitting. By using parameter-efficient fine-tuning methods, we restrict the number of trainable parameters and achieve better results.

Index Terms--large language models, automated program repair, parameter-efficient fine-tuning, AI4Code, AI4SE, ML4SE.

Software development, maintenance, and evolution are expensive processes, both in terms of money and time [1]. Supporting the efficiency of the software engineers responsible for these workflows can save significant resources and enable companies to speed up the production and delivery of products without loss of quality.
One of the key challenges that software engineers face is the occurrence of software defects, or bugs, which are unintended errors in the code that cause deviations from expected behavior. These defects vary in complexity, from simple one-line syntax errors to intricate multi-line logic bugs that can span multiple files and components. Automated Program Repair (APR) aims to support developers with the software maintenance and evolution process by helping them fix the bugs they encounter and achieve their goals faster.
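The parameter-efficient fine-tuning idea from the abstract above can be illustrated with a minimal LoRA-style sketch. This is toy NumPy code under assumed shapes and names, not the paper's actual setup: instead of updating the full weight matrix, only a low-rank update is trained.

```python
import numpy as np

# Minimal LoRA sketch (illustrative; all names and shapes are hypothetical).
# Instead of updating the full weight W, LoRA trains a low-rank update B @ A
# with rank r << min(d_out, d_in), so only r * (d_out + d_in) parameters train.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialised

def lora_forward(x):
    """Forward pass: frozen path plus scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialised, the adapted model starts identical to the frozen one.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.3f}")
```

The restriction on trainable parameters that the abstract credits for better results shows up directly here: for these shapes only 512 of 4096 parameters are trained, while the frozen weights preserve the pre-trained distribution.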
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Europe > Switzerland > Bern > Bern (0.04)
- Europe > Netherlands (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.87)
Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog
Yao, Bingkun, Wang, Ning, Zhou, Jie, Wang, Xi, Gao, Hong, Jiang, Zhe, Guan, Nan
Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since their introduction, Large Language Models (LLMs) have shown strong programming capabilities. However, no work has yet considered using LLMs for bug localization in Verilog code. This paper presents Location-is-Key (LiK), an open-source LLM solution to locate functional errors in Verilog snippets. LiK achieves high localization accuracy, with a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%. Additionally, the bug location obtained by LiK significantly improves GPT-3.5's bug repair efficiency (functional pass@1 increased from 40.39% to 58.92%), highlighting the importance of bug localization in LLM-based Verilog debugging. Compared to existing methods, LiK only requires the design specification and the erroneous code snippet, without the need for testbenches, assertions, or any other EDA tools. This research demonstrates the feasibility of using LLMs for Verilog error localization, thus providing a new direction for automatic Verilog code debugging.
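The pass@1 figures quoted above follow the standard pass@k metric for sampled model outputs. A minimal sketch of the usual unbiased estimator (the standard formulation from the code-generation literature, not code from this paper):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    passes. If there are fewer than k incorrect samples, success is certain."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem, pass@1 is simply the fraction correct.
assert pass_at_k(1, 1, 1) == 1.0
assert pass_at_k(10, 0, 1) == 0.0
# 3 of 10 generations correct -> pass@1 = 0.3
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9
```

In practice the estimator is averaged over all benchmark problems, which is how single numbers like 93.3% arise.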
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Hong Kong (0.05)
- Asia > China > Jiangsu Province > Nanjing (0.04)
RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot
Garg, Spandan, Moghaddam, Roshanak Zilouchian, Sundaresan, Neel
Performance bugs are non-functional bugs that can manifest even in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent to or better than a developer's in ~60% of the cases, getting ~39% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers.
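The retrieve-then-prompt pipeline described above can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the knowledge-base entries, the token-overlap retrieval, and the prompt template are all hypothetical stand-ins.

```python
# Illustrative sketch of retrieval-augmented prompt generation in the spirit
# of RAPGen: retrieve the most similar past-fix instruction by token overlap
# (Jaccard similarity), then assemble a zero-shot prompt around it.

def tokens(code):
    return set(code.replace("(", " ").replace(")", " ").replace(".", " ").split())

# Hypothetical knowledge base of (buggy pattern, prompt instruction) pairs.
KNOWLEDGE_BASE = [
    ("list.Count() == 0", "Use Any() instead of Count() to check emptiness."),
    ("string s = a + b + c", "Use StringBuilder for repeated concatenation."),
]

def retrieve_instruction(snippet):
    """Return the instruction whose pattern shares the most tokens with snippet."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    snip = tokens(snippet)
    return max(KNOWLEDGE_BASE, key=lambda kb: jaccard(tokens(kb[0]), snip))[1]

def build_prompt(snippet):
    """Prepend the retrieved instruction, then ask the LLM for the fixed code."""
    instruction = retrieve_instruction(snippet)
    return f"// Instruction: {instruction}\n{snippet}\n// Improved version:"

prompt = build_prompt("if (items.Count() == 0) return;")
assert "Any()" in prompt
```

A real system would use a learned or lexical retriever over thousands of mined fixes; the point here is only the shape of the pipeline: retrieve an instruction, splice it into the prompt, send it zero-shot to the model.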
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
How Effective Are Neural Networks for Fixing Security Vulnerabilities
Wu, Yi, Jiang, Nan, Pham, Hung Viet, Lutellier, Thibaud, Davis, Jordan, Tan, Lin, Babkin, Petr, Shah, Sameena
Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to study and compare the Java vulnerability repair capabilities of LLMs and DL-based APR models. The contributions include that we (1) apply and evaluate five LLMs (Codex, CodeGen, CodeT5, PLBART, and InCoder), four fine-tuned LLMs, and four DL-based APR techniques on two real-world Java vulnerability benchmarks (Vul4J and VJBench), (2) design code transformations to address the training and test data overlap threat to Codex, (3) create a new Java vulnerability repair benchmark, VJBench, and its transformed version, VJBench-trans, and (4) evaluate LLMs and APR techniques on the transformed vulnerabilities in VJBench-trans. Our findings include that (1) existing LLMs and APR models fix very few Java vulnerabilities; Codex fixes 10.2 (20.4%), the most of any model. (2) Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities. (3) Our new VJBench reveals that LLMs and APR models fail to fix many Common Weakness Enumeration (CWE) types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling. (4) Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities. The results call for innovations to enhance automated Java vulnerability repair, such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformations to facilitate vulnerability repair.
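The "code transformations to address the training and test data overlap threat" mentioned above are semantics-preserving rewrites of benchmark code. A minimal sketch of one such transformation, systematic identifier renaming; the canonical renaming scheme (v0, v1, ...) and the keyword list are illustrative, not the paper's exact design:

```python
import re

# Sketch of a semantics-preserving transformation used to probe memorisation:
# rename every non-keyword identifier in a Java snippet to a fresh canonical
# name, so a model that merely memorised the original text no longer matches.
JAVA_KEYWORDS = {"int", "return", "if", "else", "for", "while",
                 "public", "static", "void"}

def rename_identifiers(code):
    """Replace each non-keyword identifier with a canonical fresh name (v0, v1, ...)."""
    mapping = {}
    def repl(match):
        name = match.group(0)
        if name in JAVA_KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", repl, code)

original = "int total = count + offset; return total;"
transformed = rename_identifiers(original)
assert transformed == "int v0 = v1 + v2; return v0;"
```

Because the rewrite preserves program behavior, any drop in repair rate on the transformed benchmark isolates memorisation of surface text from genuine repair ability.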
- North America > Canada > Alberta (0.14)
- North America > United States > Washington > King County > Seattle (0.05)
- Asia > Middle East > Jordan (0.05)
Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5
Bui, Nghi D. Q., Wang, Yue, Hoi, Steven
Automated software debugging is a crucial task for improving the productivity of software developers. Many neural-based techniques have been proven effective for debugging-related tasks such as bug localization and program repair (or bug fixing). However, these techniques often focus only on one of them or approach them in a stage-wise manner, ignoring the mutual benefits between them. In this work, we propose a novel unified Detect-Localize-Repair framework, named CodeT5-DLR, based on the pretrained programming language model CodeT5 to seamlessly address these tasks. Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy or not, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code to its fixed version. We evaluate it on each of these tasks and their combined setting on two newly collected line-level debugging datasets in Java and Python. Extensive results show that our model significantly outperforms existing baselines from both NLP and software engineering domains.
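The three objectives above are trained jointly on a shared model. A toy sketch of how such a combined loss might be assembled; the weighting, the helper names, and the example numbers are all hypothetical, and a real setup would compute these losses from a shared CodeT5 encoder:

```python
import math

# Illustrative combination of the three debugging objectives into one loss:
# snippet-level detection (binary), line-level localization (binary per line),
# and repair (sequence-to-sequence negative log-likelihood).

def bce(p, y):
    """Binary cross-entropy for a single probability p against label y."""
    eps = 1e-9
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def debug_loss(is_buggy_prob, buggy_label, line_probs, line_labels, repair_token_nll):
    detect = bce(is_buggy_prob, buggy_label)                     # is the snippet buggy?
    localize = sum(bce(p, y)                                     # which lines are buggy?
                   for p, y in zip(line_probs, line_labels)) / len(line_probs)
    repair = repair_token_nll                                    # generate the fixed code
    return detect + localize + repair

# Toy example: confident correct detection, one buggy line found, some repair NLL.
loss = debug_loss(0.9, 1, [0.1, 0.8, 0.2], [0, 1, 0], 1.5)
assert loss > 0
```

Joint training lets localization signal sharpen detection and vice versa, which is the "mutual benefit" the abstract contrasts with stage-wise pipelines.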
- North America > United States > New York (0.04)
- Asia (0.04)
CURE: Code-Aware Neural Machine Translation for Automatic Program Repair
Jiang, Nan, Lutellier, Thibaud, Tan, Lin
Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to fix software bugs automatically. While promising, these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.
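CURE's code-aware search strategy favors compilable patches close in length to the buggy code. A crude illustrative sketch of that idea, not CURE's implementation: it uses Python's parser as a stand-in validity check (CURE targets Java and checks real compilability) and ranks by token-length closeness.

```python
import ast

# Illustrative sketch of code-aware candidate filtering: discard patches that
# do not parse, then prefer patches whose token length is close to the bug's.

def parses(snippet):
    """Crude validity proxy: does the candidate parse as Python?"""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

def rank_patches(buggy_line, candidates):
    """Keep parseable candidates, rank by closeness in token count to the bug."""
    n = len(buggy_line.split())
    valid = [c for c in candidates if parses(c)]
    return sorted(valid, key=lambda c: abs(len(c.split()) - n))

buggy = "result = a - b"
candidates = ["result = a + b",               # valid, same length: ranked first
              "result = a + ",                # syntax error: filtered out
              "out = a + b + extra_term + 1"] # valid but much longer: ranked last
ranked = rank_patches(buggy, candidates)
assert ranked[0] == "result = a + b"
```

Pruning invalid and implausibly long candidates shrinks the search space toward patches that real fixes tend to resemble, which is the intuition behind CURE's gains over plain beam search.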
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
Chen, Zimin, Kommrusch, Steve, Tufano, Michele, Pouchet, Louis-Noël, Poshyvanyk, Denys, Monperrus, Martin
This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 samples, carefully curated from commits to open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well as on the Defects4J benchmark used in program repair research. SequenceR is able to perfectly predict the fixed line for 950/4711 testing samples, and finds correct patches for 14 bugs in Defects4J. It captures a wide range of repair operators without any domain-specific top-down design.
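The copy mechanism mentioned above mixes the decoder's vocabulary distribution with its attention over source tokens, so out-of-vocabulary identifiers from the buggy code can still appear in the fix. A toy numeric sketch following the usual pointer-generator formulation; all tensor values here are made up for illustration:

```python
import numpy as np

# Copy-mechanism sketch: blend generation from a fixed vocabulary with
# copying from the source, so an OOV identifier can still be emitted.
vocab = ["<unk>", "return", "x", "+", "1"]
source_tokens = ["return", "my_counter"]        # "my_counter" is out-of-vocabulary

p_vocab = np.array([0.1, 0.5, 0.2, 0.1, 0.1])   # decoder's vocabulary distribution
attention = np.array([0.3, 0.7])                # attention over source tokens
p_gen = 0.4                                     # probability of generating vs copying

# Extended vocabulary = fixed vocab plus OOV tokens present in the source.
extended = vocab + ["my_counter"]
p_final = np.zeros(len(extended))
p_final[:len(vocab)] = p_gen * p_vocab          # generation path
for tok, a in zip(source_tokens, attention):    # copy path, routed by attention
    p_final[extended.index(tok)] += (1 - p_gen) * a

assert abs(p_final.sum() - 1.0) < 1e-9
# The OOV identifier receives probability mass only through copying.
assert abs(p_final[extended.index("my_counter")] - (1 - p_gen) * 0.7) < 1e-12
```

This is how the "unlimited vocabulary problem" is sidestepped: rare project-specific names never need to be in the output vocabulary, because the pointer can copy them directly from the buggy line's context.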
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Colorado (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)